{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Classification\n", "\n", "The goal of classification is to find a function that separates the data into positive/negative labels. In the case of a linear classifier, this reduces to finding a set of parameters $w^\\star$ such that, $$ \\begin{align} w^\\star &= \\arg \\min_w \\sum_{i=1}^{N} \\left[y_i\\neq \\text{sign} (w^\\top x_i) \\right] \\\\ &= \\arg \\min_w \\sum_{i=1}^{N} l_{0/1} (w; x_i, y_i) \\end{align}.$$ \n", "\n", "The problem with the $l_{0/1}$ loss, is that it is non-convex (and non-differentiable), hence other surrogate losses must be used to optimize the number of points. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2\n", "import ipywidgets\n", "from ipywidgets import interact, interactive, interact_manual\n", "import IPython\n", "from matplotlib import rcParams\n", "rcParams['figure.figsize'] = (10, 5)\n", "rcParams['font.size'] = 16\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from utilities.util import gradient_descent\n", "from utilities.load_data import linear_separable_data, circular_separable_data\n", "from utilities import plot_helpers \n", "from utilities.classifiers import Perceptron, SVM, Logistic\n", "from utilities.regularizers import L1Regularizer, L2Regularizer" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "rcParams['figure.figsize'] = (10, 5)\n", "rcParams['font.size'] = 16\n", "\n", "num_points = 100 # Number of points per class\n", "noise = 0.2 # Noise Level (needed for data generation).\n", "TEST_FRACTION = .80\n", "np.random.seed(42)\n", "X, Y = linear_separable_data(num_points, noise=noise, dim=2)\n", "\n", "fig = plt.subplot(111)\n", "opt = {'marker': 'ro', 'label': '+', 'size': 8}\n", "plot_helpers.plot_data(X[np.where(Y == 1)[0], 0], X[np.where(Y == 1)[0], 1], fig=fig, options=opt)\n", "opt = {'marker': 'bs', 'label': '-', 'x_label': '$x$', 'y_label': '$y$', 'size': 8, 'legend': True}\n", "plot_helpers.plot_data(X[np.where(Y == -1)[0], 0], X[np.where(Y == -1)[0], 1], fig=fig, options=opt)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "rcParams['figure.figsize'] = (10, 5)\n", "rcParams['font.size'] = 16\n", "\n", "# Separate into train and test sets!\n", "indexes = np.arange(0, 2 * num_points, 1)\n", "np.random.shuffle(indexes)\n", "num_train = int(np.ceil(2 * TEST_FRACTION * num_points))\n", "\n", "X_train = X[indexes[:num_train]]\n", "Y_train = Y[indexes[:num_train]]\n", "\n", "X_test = X[indexes[num_train:]]\n", "Y_test = Y[indexes[num_train:]]\n", "\n", "fig = plt.subplot(111)\n", "\n", "opt = {'marker': 'ro', 'fillstyle': 'full', 'label': '+ Train', 'size': 8}\n", "plot_helpers.plot_data(X_train[np.where(Y_train == 1)[0], 0], X_train[np.where(Y_train == 1)[0], 1], fig=fig, options=opt)\n", "opt = {'marker': 'bs', 'fillstyle': 'full', 'label': '- Train', 'size': 8}\n", "plot_helpers.plot_data(X_train[np.where(Y_train == -1)[0], 0], X_train[np.where(Y_train == -1)[0], 1], fig=fig, options=opt)\n", "\n", "opt = {'marker': 'ro', 'fillstyle': 'none', 'label': '+ Test', 'size': 8}\n", "plot_helpers.plot_data(X_test[np.where(Y_test == 1)[0], 0], X_test[np.where(Y_test == 1)[0], 1], fig=fig, options=opt)\n", "opt = {'marker': 'bs', 'fillstyle': 'none', 'label': '- Test', 'size': 8, \n", " 'x_label': '$x$', 'y_label': '$y$', 'legend': True}\n", "plot_helpers.plot_data(X_test[np.where(Y_test == -1)[0], 0], X_test[np.where(Y_test == -1)[0], 1], fig=fig, options=opt)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Perceptron Algorithm\n", "\n", "The perceptron loss is defined as: $$L(w; X, Y) = \\sum_{i=1}^{N} L_p(w; x_i, y_i) = \\sum_{i=1}^{N} \\max \\{ 0, -y_i w^\\top x_i \\}.$$\n", "\n", "The loss function is continuous, but not differentialbe at $y_i w^\\top x_i=0$. The subgradient, however, exists and hence (stochastic) gradient descent converges. The subgradient is:\n", "\n", "$$ \\partial L_p(w; x_i,y_i) = \\left\\{\\begin{array}{cc} 0 & \\text{if } -y_i w^\\top x_i < 0 \\\\ -y_i x_i & \\text{if } -y_i w^\\top x_i > 0 \\\\ \\left[0, -y_i x_i \\right] & \\text{if } -y_i w^\\top x_i = 0 \\end{array} \\right. $$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "rcParams['figure.figsize'] = (20, 5)\n", "rcParams['font.size'] = 16\n", "\n", "n_iter_widget = ipywidgets.IntSlider(value=20, min=5, max=100, step=1,\n", " description='Number of iterations:', style={'description_width': 'initial'},\n", " continuous_update=False)\n", "batch_size_widget = ipywidgets.IntSlider(value=1, min=1, max=X_train.shape[0], step=1,\n", " description='Batch Size:', style={'description_width': 'initial'},\n", " continuous_update=False)\n", "noise_widget = ipywidgets.FloatSlider(value=0.2, min=0, max=1, step=0.1, readout_format='.1f',\n", " description='Noise:', style={'description_width': 'initial'},\n", " continuous_update=False)\n", "\n", "def change_learning_params(n_iter, batch_size, noise):\n", " np.random.seed(42)\n", " X, Y = linear_separable_data(num_points, noise=noise, dim=2)\n", " indexes = np.arange(0, 2 * num_points, 1)\n", " np.random.shuffle(indexes)\n", " num_train = int(np.ceil(2 * TEST_FRACTION * num_points))\n", "\n", " X_train = X[indexes[:num_train]]\n", " Y_train = Y[indexes[:num_train]]\n", "\n", " X_test = X[indexes[num_train:]]\n", " Y_test = Y[indexes[num_train:]]\n", "\n", " classifier = Perceptron(X_train, Y_train)\n", " classifier.load_test_data(X_test, Y_test)\n", " \n", " np.random.seed(42)\n", " w0 = np.random.randn(3, )\n", "\n", " opts = {'eta0': 1,\n", " 'n_iter': n_iter,\n", " 'batch_size': batch_size,\n", " 'n_samples': X_train.shape[0],\n", " 'algorithm': 'SGD',\n", " 'learning_rate_scheduling': None,\n", " }\n", " try:\n", " trajectory, indexes = gradient_descent(w0, classifier, opts=opts)\n", "\n", " contour_plot = plt.subplot(121)\n", " error_plot = plt.subplot(122)\n", "\n", " opt = {'marker': 'ro', 'fillstyle': 'full', 'label': '+ Train', 'size': 8}\n", " plot_helpers.plot_data(X_train[np.where(Y_train == 1)[0], 0], X_train[np.where(Y_train == 1)[0], 1], fig=contour_plot, options=opt)\n", " opt = {'marker': 'bs', 'fillstyle': 'full', 'label': '- Train', 'size': 8}\n", " plot_helpers.plot_data(X_train[np.where(Y_train == -1)[0], 0], X_train[np.where(Y_train == -1)[0], 1], fig=contour_plot, options=opt)\n", "\n", " opt = {'marker': 'ro', 'fillstyle': 'none', 'label': '+ Test', 'size': 8}\n", " plot_helpers.plot_data(X_test[np.where(Y_test == 1)[0], 0], X_test[np.where(Y_test == 1)[0], 1], fig=contour_plot, options=opt)\n", " opt = {'marker': 'bs', 'fillstyle': 'none', 'label': '- Test', 'size': 8}\n", " plot_helpers.plot_data(X_test[np.where(Y_test == -1)[0], 0], X_test[np.where(Y_test == -1)[0], 1], fig=contour_plot, options=opt)\n", "\n", " contour_opts = {'n_points': 50, 'x_label': '$x$', 'y_label': '$y$', 'sgd_point': True, 'n_classes': 4}\n", " error_opts = {'epoch': 5, 'x_label': '$t$', 'y_label': 'error'}\n", "\n", " opts = {'contour_opts': contour_opts, 'error_opts': error_opts}\n", " plot_helpers.classification_progression(X, Y, trajectory, indexes, classifier, \n", " contour_plot=contour_plot, error_plot=error_plot, \n", " options=opts)\n", " except KeyboardInterrupt:\n", " pass\n", " \n", "interact_manual(change_learning_params, n_iter=n_iter_widget, batch_size=batch_size_widget,\n", " noise=noise_widget);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The SVM Algorithm\n", "\n", "The svm loss is defined as: $$L(w; X, Y) = \\sum_{i=1}^{N} L_{\\text{svm}} (w; x_i, y_i) = \\sum_{i=1}^{N} \\max \\{ 0, 1-y_i w^\\top x_i \\}.$$\n", "\n", "The loss function is continuous, but not differentialbe at $y_i w^\\top x_i=0$. The subgradient, however, exists and hence (stochastic) gradient descent converges. The subgradient is:\n", "\n", "$$ \\partial L_{\\text{svm}}(w;x_i,y_i) = \\left\\{\\begin{array}{cc} 0 & \\text{if } 1-y_i w^\\top x_i < 0 \\\\ -y_i x_i & \\text{if } 1-y_i w^\\top x_i > 0 \\\\ \\left[0, -y_i x_i \\right] & \\text{if } 1-y_i w^\\top x_i = 0 \\end{array} \\right. $$\n", "\n", "The difference with the perceptron loss is that the SVM loss includes a loss margin. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "rcParams['figure.figsize'] = (10, 5)\n", "rcParams['font.size'] = 16\n", "reg_widget = ipywidgets.FloatSlider(value=-6, min=-6, max=3, step=0.5, readout_format='.1f',\n", " description='Regularization 10^:', style={'description_width': 'initial'},\n", " continuous_update=False)\n", "lr_widget = ipywidgets.FloatSlider(value=1, min=1e-1, max=2, step=1 * 1e-1, readout_format='.1f', \n", " description='Learning rate:', style={'description_width': 'initial'},\n", " continuous_update=False)\n", "n_iter_widget = ipywidgets.IntSlider(value=20, min=5, max=100, step=1,\n", " description='Number of iterations:', style={'description_width': 'initial'},\n", " continuous_update=False)\n", "\n", "batch_size_widget = ipywidgets.IntSlider(value=1, min=1, max=X_train.shape[0], step=1,\n", " description='Batch Size:', style={'description_width': 'initial'},\n", " continuous_update=False)\n", "noise_widget = ipywidgets.FloatSlider(value=0.2, min=0, max=1, step=0.1, readout_format='.1f',\n", " description='Noise:', style={'description_width': 'initial'},\n", " continuous_update=False)\n", "\n", "def change_learning_params(reg, eta0, n_iter, batch_size, noise):\n", " np.random.seed(42)\n", " X, Y = linear_separable_data(num_points, noise=noise, dim=2)\n", " indexes = np.arange(0, 2 * num_points, 1)\n", " np.random.shuffle(indexes)\n", " num_train = int(np.ceil(2 * TEST_FRACTION * num_points))\n", "\n", " X_train = X[indexes[:num_train]]\n", " Y_train = Y[indexes[:num_train]]\n", "\n", " X_test = X[indexes[num_train:]]\n", " Y_test = Y[indexes[num_train:]]\n", " \n", " classifier = SVM(X_train, Y_train)\n", " classifier.load_test_data(X_test, Y_test)\n", " \n", " regularizer = L2Regularizer(np.power(10., reg))\n", " np.random.seed(42)\n", " w0 = np.random.randn(3, )\n", "\n", " opts = {'eta0': eta0,\n", " 'n_iter': n_iter,\n", " 'batch_size': batch_size,\n", " 'n_samples': X_train.shape[0],\n", " 'algorithm': 'SGD',\n", " 'learning_rate_scheduling': 'AnnealingSVM',\n", " 'reg': regularizer.get_lambda() / batch_size\n", " }\n", " try:\n", " trajectory, indexes = gradient_descent(w0, classifier, regularizer, opts)\n", "\n", " contour_plot = plt.subplot(121)\n", " error_plot = plt.subplot(122)\n", "\n", " opt = {'marker': 'ro', 'fillstyle': 'full', 'label': '+ Train', 'size': 8}\n", " plot_helpers.plot_data(X_train[np.where(Y_train == 1)[0], 0], X_train[np.where(Y_train == 1)[0], 1], fig=contour_plot, options=opt)\n", " opt = {'marker': 'bs', 'fillstyle': 'full', 'label': '- Train', 'size': 8}\n", " plot_helpers.plot_data(X_train[np.where(Y_train == -1)[0], 0], X_train[np.where(Y_train == -1)[0], 1], fig=contour_plot, options=opt)\n", "\n", " opt = {'marker': 'ro', 'fillstyle': 'none', 'label': '+ Test', 'size': 8}\n", " plot_helpers.plot_data(X_test[np.where(Y_test == 1)[0], 0], X_test[np.where(Y_test == 1)[0], 1], fig=contour_plot, options=opt)\n", " opt = {'marker': 'bs', 'fillstyle': 'none', 'label': '- Test', 'size': 8}\n", " plot_helpers.plot_data(X_test[np.where(Y_test == -1)[0], 0], X_test[np.where(Y_test == -1)[0], 1], fig=contour_plot, options=opt)\n", "\n", " contour_opts = {'n_points': 100, 'x_label': '$x$', 'y_label': '$y$', 'sgd_point': True, 'n_classes': 4}\n", " error_opts = {'epoch': 5, 'x_label': '$t$', 'y_label': 'error'}\n", "\n", " opts = {'contour_opts': contour_opts, 'error_opts': error_opts}\n", " plot_helpers.classification_progression(X, Y, trajectory, indexes, classifier, \n", " contour_plot=contour_plot, error_plot=error_plot, \n", " options=opts)\n", " except KeyboardInterrupt:\n", " pass\n", "interact(change_learning_params, reg=reg_widget, eta0=lr_widget, n_iter=n_iter_widget,\n", " batch_size=batch_size_widget, noise=noise_widget);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } }, "widgets": { "state": { "1dbcd274dcfe470b91708912892ed69f": { "views": [ { "cell_index": 18 } ] }, "2cfdd1a03116452aa5629c3d90011059": { "views": [ { "cell_index": 13 } ] }, "4e37636106d041fab7a6ff19b8cb36c7": { "views": [ { "cell_index": 15 } ] }, "674c84c930124a6f95d134bf447ea1c1": { "views": [ { "cell_index": 22 } ] }, "85cd0d5f4f1e4750ab3b284e95bc5242": { "views": [ { "cell_index": 24 } ] }, "a44fb013255548a7b20262200fd53b86": { "views": [ { "cell_index": 20 } ] } }, "version": "1.2.0" } }, "nbformat": 4, "nbformat_minor": 2 }